Project - Bank Churn Prediction


Context:

Objective:

Data Information

Customer Details


Table of Contents (TOC)

- Importing Packages
- Unwrapping Customer Information
- Data Pre-Processing & Sanity Checks
- Summary of Data Analysis
- EDA Analysis
- Model Building
- Model Analysis
- Model Performance Comparison
- Recommendations

Importing Required Packages:




Unwrapping the Customer Information:



Data Description:


Data Preprocessing & Sanity Checks



Observations:

Dropping the Customer ID Column

Checking for Duplicates

Checking for Columns with missing values
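A minimal pandas sketch of the three sanity checks above (dropping the ID column, counting duplicates, counting missing values). It assumes the raw data is in a DataFrame whose ID column is named `CustomerId`; that name and the toy rows are illustrative assumptions, not the project data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw customer data (made-up rows, not the project dataset).
# The ID column name "CustomerId" is an assumption.
df = pd.DataFrame({
    "CustomerId": [101, 102, 103, 104],
    "Geography": ["France", "Spain", "Germany", "Spain"],
    "Age": [42.0, 35.0, np.nan, 35.0],
    "Exited": [1, 0, 0, 0],
})

# 1. Drop the ID column: it identifies rows but carries no predictive signal
df = df.drop(columns=["CustomerId"])

# 2. Count fully duplicated rows (all remaining columns identical)
n_duplicates = int(df.duplicated().sum())

# 3. Count missing values per column, keeping only columns that have any
missing = df.isnull().sum()
missing = missing[missing > 0]

print("duplicates:", n_duplicates)
print(missing)
```

Note that duplicates are checked after the ID column is dropped; with a unique ID still present, no two rows could ever match.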

Observations:

Validating the column values to observe patterns and check data correctness

Observations:


Inferences:


Summary of Data Analysis


Data Structure:

Data Cleaning:

Data Insight:

For more details on the data, refer to the comments in the Data Description and Feature Value observations sections.


Common Functions


Data Description Post Treatment



EDA Analysis - Analyzing each attribute to understand the data patterns



Analyzing the count and percentage of Categorical attributes using a bar chart
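The numbers behind such a bar chart are a count and a percentage per category. A minimal pandas sketch, using a made-up `Geography` column for illustration; the plotting step itself is left out.

```python
import pandas as pd

# Toy categorical column (made-up values for illustration)
geography = pd.Series(["France", "Spain", "France", "Germany", "France", "Spain"])

counts = geography.value_counts()                                   # absolute counts
percentages = geography.value_counts(normalize=True).mul(100).round(1)  # share in %

summary = pd.DataFrame({"count": counts, "percent": percentages})
print(summary)
```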

Insights from Categorical Data


Observations:


Analyzing the Numerical attributes using Histograms and Box Plots
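A box plot flags points beyond 1.5 × IQR from the quartiles as outliers (the default `whis=1.5` in matplotlib and seaborn). The sketch below computes those fences directly with pandas so the outliers seen in the plots can be counted; the toy ages are made up.

```python
import pandas as pd

def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) outlier fences a standard box plot uses."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy ages (made-up); 92 is deliberately extreme
ages = pd.Series([25, 30, 35, 40, 45, 50, 92])
low, high = iqr_bounds(ages)
outliers = ages[(ages < low) | (ages > high)]
print(low, high, outliers.tolist())
```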

Insights from Numerical Data


Observations:


Univariate Analysis



Analyzing the Balance of the Customers

Analyzing the Credit Score of the Customers

Observations:

Analyzing the Age of the bank customers

Observations:

Analyzing the EstimatedSalary

Observations:


Bivariate Analysis



Visualizing each variable's association with the Exited flag and its correlation
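One way to quantify this association for numerical features is the Pearson correlation of each feature with the 0/1 `Exited` column (for a binary variable this equals the point-biserial correlation). A minimal sketch on made-up rows:

```python
import pandas as pd

# Toy numeric data (made-up) with the binary target column Exited
df = pd.DataFrame({
    "Age":     [25, 32, 47, 51, 38, 60],
    "Balance": [0.0, 1200.0, 5000.0, 800.0, 0.0, 7000.0],
    "Exited":  [0, 0, 1, 0, 0, 1],
})

# Correlation of every feature with the target, strongest positive first
corr_with_target = df.corr()["Exited"].drop("Exited").sort_values(ascending=False)
print(corr_with_target)
```

Correlation only captures linear association; the grouped plots in this section can reveal non-linear patterns it misses.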

Analyzing the Categorical attributes with the Exited Flag

Observation:


Exited vs Geography

Exited vs Gender

Exited vs Tenure

Exited vs NumOfProducts

Exited vs HasCrCard

Exited vs IsActiveMember

Exited vs Age Group

Exited vs EstimatedSalary_Grp
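Derived groups such as Age Group can be built with `pd.cut` and then compared by churn rate (the mean of `Exited` per group). The cut points and labels below are hypothetical, since the notebook's actual bins are not shown; `EstimatedSalary_Grp` could be built the same way (or with `pd.qcut` for equal-sized groups).

```python
import pandas as pd

df = pd.DataFrame({
    "Age":    [22, 29, 35, 41, 48, 55, 63, 70],   # made-up ages
    "Exited": [0, 0, 0, 1, 1, 1, 0, 0],
})

# Hypothetical age bands: (17, 30], (30, 40], ... so integer ages map to the labels
bins = [17, 30, 40, 50, 60, 100]
labels = ["18-30", "31-40", "41-50", "51-60", "61+"]
df["Age_Grp"] = pd.cut(df["Age"], bins=bins, labels=labels)

# Churn rate per group = share of customers in the group with Exited == 1
churn_rate = df.groupby("Age_Grp", observed=True)["Exited"].mean()
print(churn_rate)
```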

Analyzing the Numerical attributes

Observation:


Exited vs Credit Score

Exited vs Age

Exited vs Balance

Exited vs Estimated Salary
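Alongside the plots, a compact numeric summary compares each numerical feature between customers who stayed and customers who exited. This sketch uses medians, which are less sensitive to outliers than means; the rows are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "CreditScore": [650, 700, 580, 720, 610, 690],
    "Balance": [0.0, 50000.0, 120000.0, 0.0, 95000.0, 30000.0],
    "Exited": [0, 0, 1, 0, 1, 0],
})

# Median of each numerical feature for stayed (0) vs exited (1) customers
summary = df.groupby("Exited")[["CreditScore", "Balance"]].median()
print(summary)
```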


Multivariate Analysis - Visualizing associations with Products Taken & correlations among the other Features



Observations:


Age_Grp vs EstimatedSalary_Grp

Observations:

Age Group vs Geography

Observations:

Balance vs Number of Products vs Exited

Observations:

HasCrCard vs Balance vs Exited

Observations:

Age Group vs Balance vs Exited

Observations:

CreditScore_Grp vs Tenure vs Balance vs Exited

Observations:

NumOfProducts vs EstimatedSalary vs Exited

Observations:

NumOfProducts vs IsActiveMember vs Age vs Exited
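A pivot table gives the churn rate for every combination of two features, which is the numeric counterpart of these multi-feature plots. A minimal sketch on made-up rows; Age could be added as a further index level after binning it (for example with `pd.cut`).

```python
import pandas as pd

df = pd.DataFrame({
    "NumOfProducts":  [1, 1, 2, 2, 3, 3, 1, 2],
    "IsActiveMember": [0, 1, 0, 1, 0, 1, 0, 1],
    "Exited":         [1, 0, 0, 0, 1, 1, 1, 0],
})

# Churn rate (mean of Exited) for each NumOfProducts x IsActiveMember cell
churn_table = df.pivot_table(index="NumOfProducts", columns="IsActiveMember",
                             values="Exited", aggfunc="mean")
print(churn_table)
```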

Observations:

CreditScore vs NumOfProducts vs Gender

Observations:


Model Building


Data Preparation for Modeling
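Neural networks need numeric inputs, so categorical columns such as Geography and Gender are typically one-hot encoded. A minimal pandas sketch; whether the notebook uses `drop_first=True` is an assumption (it removes one redundant, perfectly collinear dummy per feature).

```python
import pandas as pd

df = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender":    ["Female", "Male", "Male", "Female"],
    "Age":       [42, 35, 50, 28],
})

# One-hot encode the categorical columns; the first (alphabetical) level of
# each feature becomes the implicit baseline when drop_first=True
encoded = pd.get_dummies(df, columns=["Geography", "Gender"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
```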


Split Data
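Because churners are usually the minority class, a stratified split keeps the same churn ratio in the training and test sets, and any scaler should be fit on the training split only. A minimal scikit-learn sketch on random toy data; the 80/20 ratio and the use of `StandardScaler` are assumptions, not necessarily the notebook's choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))       # toy features (random, for illustration)
y = np.array([1] * 20 + [0] * 80)   # toy imbalanced target: 20% churners

# stratify=y keeps the churn ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then apply it to the test split,
# so no test-set statistics leak into training
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```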


Building the model


Model evaluation criterion:

The model can make two kinds of wrong predictions:

  1. Predicting that a customer will stay with the bank, but the customer leaves (False Negative) - Loss of resources
  2. Predicting that a customer will leave the bank, but the customer stays (False Positive) - Loss of opportunity

Which case is more important?

How can this loss be reduced, i.e., how can False Negatives be reduced?
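Treating `Exited = 1` as the positive class, the metric that directly penalizes False Negatives is recall, TP / (TP + FN): the share of actual churners the model catches. A minimal NumPy sketch on toy labels (scikit-learn's `recall_score` returns the same value):

```python
import numpy as np

def recall(y_true, y_pred):
    """Recall = TP / (TP + FN): the share of actual churners the model catches."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # churners correctly flagged
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # churners the model missed
    return tp / (tp + fn) if (tp + fn) else 0.0

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]   # one missed churner (FN), one false alarm (FP)
print(recall(y_true, y_pred))       # 0.75
```

The false alarm in the example does not affect recall at all, which is why recall is usually reported alongside precision or F1.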


Model Analysis
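The models below differ mainly in optimizer (SGD, Adagrad, RMSprop, Adam) and activation. As a rough guide to how these optimizers differ, here is a minimal NumPy sketch of one parameter update for each, applied to the toy function f(w) = w². The hyperparameters are common defaults chosen for illustration, not the notebook's settings, and framework implementations (e.g. Keras) differ in small details such as the initial accumulator value.

```python
import numpy as np

def sgd_step(w, g, lr=0.01):
    return w - lr * g                            # plain gradient descent

def adagrad_step(w, g, cache, lr=0.01, eps=1e-8):
    cache = cache + g ** 2                       # running sum of squared gradients
    return w - lr * g / (np.sqrt(cache) + eps), cache

def rmsprop_step(w, g, cache, lr=0.001, rho=0.9, eps=1e-8):
    cache = rho * cache + (1 - rho) * g ** 2     # moving average of squared gradients
    return w - lr * g / (np.sqrt(cache) + eps), cache

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g                    # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g ** 2               # second-moment estimate
    m_hat = m / (1 - b1 ** t)                    # bias correction, t starts at 1
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def grad(w):
    return 2 * w                                 # gradient of f(w) = w**2

w_sgd = w_ada = w_rms = w_adam = 5.0
c_ada = c_rms = m = v = 0.0
for t in range(1, 501):
    w_sgd = sgd_step(w_sgd, grad(w_sgd))
    w_ada, c_ada = adagrad_step(w_ada, grad(w_ada), c_ada)
    w_rms, c_rms = rmsprop_step(w_rms, grad(w_rms), c_rms)
    w_adam, m, v = adam_step(w_adam, grad(w_adam), m, v, t)

print(w_sgd, w_ada, w_rms, w_adam)
```

Adagrad's accumulator only grows, so its effective step keeps shrinking; RMSprop and Adam replace the sum with a moving average so the step size can recover.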


Model-1 - Basic Modeling


Observations


Model-2 - Basic ReLU with More Layers & Early Stopping Tuning
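Early stopping halts training once the validation loss stops improving for a set number of epochs (the patience). In Keras this is what `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)` does; the plain-Python sketch below mirrors that rule on a made-up loss curve so the stopping point can be traced by hand.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a patience-based early-stopping rule.

    Training stops once the validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs. Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0   # improvement: reset counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch                  # never triggered

# Made-up validation losses: best at epoch 3, then three epochs without improvement
losses = [0.60, 0.52, 0.47, 0.45, 0.46, 0.48, 0.47, 0.50]
print(early_stopping_epoch(losses, patience=3))  # (6, 3)
```

With `restore_best_weights=True`, Keras would roll the model back to the weights from the best epoch (epoch 3 here) rather than keep those from the stopping epoch.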


Observations


Model-3 - SGD with ReLU Function, Dropout & BatchNorm Tuning
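Models 3 to 7 all add Dropout and Batch Normalization. The NumPy sketch below shows what each does to a batch of activations during training: batch norm standardizes every unit over the batch and then applies a learnable scale and shift, while (inverted) dropout zeroes random units and rescales the survivors. This is a simplified illustration; Keras's `BatchNormalization` additionally tracks moving statistics for use at inference, and its `Dropout` is inactive at inference.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)     # zero mean, ~unit variance per unit
    return gamma * x_hat + beta                 # learnable scale and shift

def inverted_dropout(x, rate, rng):
    """Zero each unit with probability `rate`; rescale survivors by 1/(1-rate)
    so the expected activation is unchanged and no rescaling is needed at inference."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # toy batch: 64 samples, 4 units

h = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
d = inverted_dropout(h, rate=0.2, rng=rng)
print(h.mean(axis=0).round(6), h.std(axis=0).round(3), (d == 0).mean())
```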


Observations


Model-4 - Adagrad with Tanh Function, Dropout & BatchNorm Tuning
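Models 4 and 6 use tanh instead of ReLU in the hidden layers. A small NumPy comparison of the two activations and their derivatives on a few sample inputs, to show how they shape gradient flow:

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

relu = np.maximum(0.0, z)          # zero for negatives, identity for positives
tanh = np.tanh(z)                  # squashes into (-1, 1), zero-centered

# Derivatives used during backpropagation
relu_grad = (z > 0).astype(float)  # 0 or 1 (taken as 0 at z == 0)
tanh_grad = 1.0 - tanh ** 2        # at most 1, shrinks toward 0 for large |z|

print(relu, tanh.round(3), relu_grad, tanh_grad.round(3))
```

ReLU passes gradients through unchanged for positive inputs, while tanh's gradient saturates for large inputs; tanh's zero-centered output is one reason it can pair well with some optimizers.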


Observations


Model-5 - RMSprop with ReLU Function, Dropout & BatchNorm


Observations


Model-6 - Adam with Tanh Function, Dropout & BatchNorm Tuning


Observations


Model-7 - Adam with ReLU Function, Dropout & BatchNorm Tuning


Observations


Comparing the Training vs Testing Performance
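Overfitting shows up as a gap between training and test scores. A minimal scikit-learn sketch that scores one model's predictions on both splits and reports the train-minus-test gap per metric; the function name `train_test_gap` and the toy labels are illustrative, not from the notebook.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

def train_test_gap(y_train, p_train, y_test, p_test):
    """Score predictions on both splits; a large train-minus-test gap hints at overfitting."""
    rows = {}
    for name, metric in [("accuracy", accuracy_score),
                         ("recall", recall_score),
                         ("f1", f1_score)]:
        train, test = metric(y_train, p_train), metric(y_test, p_test)
        rows[name] = {"train": train, "test": test, "gap": train - test}
    return rows

# Toy labels/predictions: perfect on train, one missed churner on test
rows = train_test_gap([1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0])
for name, r in rows.items():
    print(f"{name:8s} train={r['train']:.3f} test={r['test']:.3f} gap={r['gap']:.3f}")
```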

Model Performance Comparison:


Analyzing the performance of each of the models:

Based on the comparisons, Model 4 appears to have the best recall, accuracy, and F1 scores with less overfitting, followed by Model 7. Model 6 also scores well, but it overfits.


Recommendations:


Based on the Customer Information:

Based on the customers' data patterns, we found the following insights, which can be leveraged as recommendations for understanding the customers:

